Abstract

The famous hypercontractive estimate discovered independently by Gross [10], Bonami [3]
and Beckner [1] has had great impact on combinatorics and theoretical computer science
since first used in this setting in the seminal KKL paper. The usual proofs of
this inequality begin with the two-point space, where some elementary calculus is used,
and then generalise immediately by induction on the dimension, using submultiplicativity
(Minkowski's integral inequality). In this paper we prove the inequality using information
theory. We compare the entropy of a pair of correlated vectors in $\{0,1\}^n$ to their separate
entropies, analysing them bit by bit (not as a figure of speech, but as the bits are revealed),
using the chain rule of entropy.


1 Introduction 

The inequality that we consider in this note is a two-function version of a famous
hypercontractive inequality due, independently, to Gross [10], Bonami [3] and Beckner [1]. This
inequality, first introduced to the combinatorial landscape in the seminal KKL paper, has become
one of the cornerstones of the analytical approach to Boolean functions and theoretical computer
science, see e.g. [2], [7j, [8], [TO],[TO], [TO], and many many others. See chapter 16 of [TO] for a 
historical background. 

Let $\epsilon \in (0,1)$. We will be considering an operator $T_\epsilon$ which acts on real valued functions
on $\{0,1\}^n$. We consider two equivalent definitions of the operator: a spectral definition and
a more probabilistic one. The first definition is via the eigenfunctions and eigenvalues of the
operator. The eigenfunctions are precisely the Walsh-Fourier characters, $\{u_X\}_{X \in \{0,1\}^n}$, which
form a complete orthonormal system under the standard inner product on $\{0,1\}^n$. We recall
the definition of these characters. For $X, Y \in \{0,1\}^n$,

\[ u_X(Y) = (-1)^{\sum_i X_i Y_i}. \]

Given a function $f : \{0,1\}^n \to \mathbb{R}$, and its unique Fourier expansion, $f = \sum_X \hat{f}(X) u_X$, the action
of $T_\epsilon$ on $f$ is defined by

\[ T_\epsilon(f) = \sum_X \epsilon^{\sum_i X_i} \hat{f}(X) u_X. \]
This definition of $T_\epsilon$ stresses the fact that it "focuses" on the low-frequency part of the Fourier
spectrum, an idea that was a crucial element in the KKL proof.

For the other definition let $X, Y$ be random variables taking values in $\{0,1\}^n$. Either for
fixed $X$, or any distribution of $X$, let $Y$ be such that for every coordinate $i$, independently, $Y_i$
is chosen so that $\Pr[X_i = Y_i] = \frac{1+\epsilon}{2}$. (Or, if one prefers the $\{-1,1\}^n$ setting, the restriction is
$E[X_i Y_i] = \epsilon$.) Note that if $X$ is chosen uniformly, then the marginal distribution of $Y$ is also
uniform. We call such a pair $(X,Y)$ an $\epsilon$-correlated pair. Then one can define

\[ (T_\epsilon f)(X) = E[f(Y)]. \]

It is not hard to verify that these two definitions of $T_\epsilon$ are equivalent. The second definition,
which is the one we will be working with in this paper, stresses the connection of this operator
to random walks and isoperimetric inequalities, as it enables one to bound the probability of an
$\epsilon$-correlated pair of random points $X, Y$ to belong to given sets. It also explains the fact (that
will be made formal shortly) that $T_\epsilon(f)$ is smoother than $f$, as the operator is an averaging
operator. Without further ado, here is the statement of the inequality.

Theorem 1.1: [Gross, Bonami, Beckner] Let $f : \{0,1\}^n \to \mathbb{R}$ and let $\epsilon \in [0,1]$. Then

\[ |T_\epsilon(f)|_2 \le |f|_{1+\epsilon^2}, \tag{1} \]

where, as usual, $|g|_p = (E[|g|^p])^{1/p}$, and the expectation is with respect to the uniform measure.
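
The equivalence of the two definitions of $T_\epsilon$, and inequality (1) itself, can be checked by brute
force on a small cube. The following is a minimal Python sketch and not part of the original
argument. The function names (`cube`, `character`, `noise_spectral`, `noise_average`, `norm`), the
dictionary representation of a function, and the sample values of `n` and `eps` are all assumptions
made here for illustration.

```python
import itertools
import random


def cube(n):
    """All points of {0,1}^n as tuples."""
    return list(itertools.product([0, 1], repeat=n))


def character(x, y):
    """Walsh-Fourier character u_x(y) = (-1)^{sum_i x_i y_i}."""
    return -1.0 if sum(a * b for a, b in zip(x, y)) % 2 else 1.0


def noise_spectral(f, eps, n):
    """T_eps via the Fourier expansion: fhat(x) is multiplied by eps^{|x|}."""
    pts = cube(n)
    fhat = {x: sum(f[y] * character(x, y) for y in pts) / 2 ** n for x in pts}
    return {y: sum(eps ** sum(x) * fhat[x] * character(x, y) for x in pts)
            for y in pts}


def noise_average(f, eps, n):
    """T_eps via averaging: (T f)(x) = E[f(Y)], Pr[x_i = Y_i] = (1+eps)/2."""
    pts = cube(n)

    def prob(x, y):
        agree = sum(a == b for a, b in zip(x, y))
        return ((1 + eps) / 2) ** agree * ((1 - eps) / 2) ** (n - agree)

    return {x: sum(prob(x, y) * f[y] for y in pts) for x in pts}


def norm(g, p):
    """|g|_p = (E|g|^p)^(1/p) with respect to the uniform measure."""
    return (sum(abs(v) ** p for v in g.values()) / len(g)) ** (1 / p)


# Illustrative data: a random real-valued f on {0,1}^3 (arbitrary choices).
random.seed(0)
n, eps = 3, 0.4
f = {x: random.uniform(-1.0, 1.0) for x in cube(n)}
Tf_spectral = noise_spectral(f, eps, n)
Tf_average = noise_average(f, eps, n)
```

Under these assumptions the two dictionaries `Tf_spectral` and `Tf_average` should agree up to
rounding, and `norm(Tf_average, 2)` should not exceed `norm(f, 1 + eps ** 2)`.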

Remarks:

• There are various refinements of this inequality either dealing with non-uniform measure,
the norm of $T_\epsilon$ as an operator from $L^p$ to $L^q$ for $q \neq 2$, studying products of a base
space with more than two points, and also a reverse inequality that deals with the case
$p, q < 1$. It would not be surprising if the method of this note could
be extended to cover such cases too.

• It's not difficult to see that (1) is equivalent to the following. Let $f, g : \{0,1\}^n \to \mathbb{R}$, let
$X$ be uniformly distributed on $\{0,1\}^n$, and let $X, Y$ be an $\epsilon$-correlated pair. Then

\[ E[f(X)g(Y)] \le |f|_{1+\epsilon}\,|g|_{1+\epsilon}. \tag{2} \]

This is the inequality proven in this paper.

• A major portion of the applications of the hypercontractive inequality deal with the case
when $f$ and $g$ are Boolean functions. We will start our proof with this setting, and then
show how a small variation deals with the general case.
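
Inequality (2) can also be probed numerically before reading the proof. The sketch below is an
illustration only, written in Python under stated assumptions: the helper names
(`correlated_expectation`, `norm`, `slack`), the restriction to non-negative random test functions,
and the sample sizes are choices made here and do not come from the paper.

```python
import itertools
import random


def correlated_expectation(f, g, eps, n):
    """E[f(X) g(Y)] for X uniform and Pr[X_i = Y_i] = (1+eps)/2 independently."""
    pts = list(itertools.product([0, 1], repeat=n))
    total = 0.0
    for x in pts:
        for y in pts:
            agree = sum(u == v for u, v in zip(x, y))
            weight = ((1 + eps) / 2) ** agree * ((1 - eps) / 2) ** (n - agree)
            total += weight * f[x] * g[y] / 2 ** n
    return total


def norm(g, p):
    """|g|_p = (E|g|^p)^(1/p) with respect to the uniform measure."""
    return (sum(abs(v) ** p for v in g.values()) / len(g)) ** (1 / p)


def slack(f, g, eps, n):
    """Right hand side of (2) minus its left hand side."""
    return norm(f, 1 + eps) * norm(g, 1 + eps) - correlated_expectation(f, g, eps, n)


# Random non-negative test functions on {0,1}^3 (arbitrary illustrative data).
random.seed(1)
n = 3
pts = list(itertools.product([0, 1], repeat=n))
slacks = []
for _ in range(200):
    eps = random.uniform(0.05, 0.95)
    f = {x: random.uniform(0.0, 1.0) for x in pts}
    g = {x: random.uniform(0.0, 1.0) for x in pts}
    slacks.append(slack(f, g, eps, n))
```

If (2) holds, every entry of `slacks` is non-negative up to rounding, and constant functions
give slack zero.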

In this paper we apply an information-theoretic approach to proving (2) for Boolean functions,
trying to analyse the pair $(X, Y)$ as the coordinates of $X$ and $Y$ are revealed to us one by one.
Since all known direct proofs of (1) use induction, it is not surprising that one should adopt such
a sequential approach. The difference is that the usual proofs begin with the two point space
and proceed by induction, using submultiplicativity of the product operator and Minkowski's
integral inequality, whereas we use the chain rule of entropy, exposing the bits of the vectors
in question one by one, and comparing the amount of information of their joint distribution
with the information captured by their marginal distributions. Fortunately it turns out that
regardless of the prefixes revealed so far, at every step the conditional entropies obey the same
inequality.

This is not the first application of entropy to this hypercontractive inequality. In [9] the
dual form of the inequality is proven for the case of comparing the 2-norm and the 4-norm of a
low degree polynomial on $\{0,1\}^n$. Blais and Tan, [3], managed to improve this approach and,
surprisingly, extract the precise optimal hypercontractive constant for comparing the 2-norm
and the $q$-norm of such polynomials, for all positive even integers $q$. Both these proofs analyse
the Fourier space rather than the primal space, and use no induction at all.

One final remark regarding the proof in this paper. Although it is not difficult, it is probably, 
to date, the most involved proof of the inequality from a technical point of view. I believe that 
nonetheless it is worthwhile to add it to the list of existing proofs, because it offers a new point 
of view which directly addresses the notion of studying projections of the joint distribution of 
a pair of $\epsilon$-correlated vectors.


2 Main Theorem 

2.1 The Boolean case 

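Theorem 2.1: Let $\epsilon \in (0,1)$, and let $\mathcal{X}, \mathcal{Y} \subseteq \{0,1\}^n$ be nonempty. Let $X$ be uniformly
distributed on $\{0,1\}^n$, and let $Y$ be such that for each $1 \le i \le n$ independently
$\Pr[X_i = Y_i] = \frac{1+\epsilon}{2}$. Then

\[ E[1_{\mathcal{X}}(X)\,1_{\mathcal{Y}}(Y)] \le \big(\mu(\mathcal{X})\mu(\mathcal{Y})\big)^{\frac{1}{1+\epsilon}}, \tag{3} \]

where $\mu$ denotes the uniform measure on $\{0,1\}^n$.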

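Proof: For $X, Y \in \{0,1\}^n$ let $a(X,Y)$ denote the number of coordinates on which $X$ and $Y$
agree, and $d(X,Y)$ be the number of coordinates on which they differ. Then the theorem is
equivalent (by straightforward manipulation) to

\[ \log\Big(\sum_{X \in \mathcal{X}} \sum_{Y \in \mathcal{Y}} (1+\epsilon)^{a(X,Y)} (1-\epsilon)^{d(X,Y)}\Big) \le \frac{2n\epsilon}{1+\epsilon} + \frac{1}{1+\epsilon}\big(\log|\mathcal{X}| + \log|\mathcal{Y}|\big), \tag{4} \]

where all logs are base 2. As usual in proofs using entropy, it suffices, by continuity, to treat
the case where $\epsilon$ is rational. Let $s < r$ be natural numbers such that $\epsilon = \frac{r-s}{r+s}$, so that
$1+\epsilon = \frac{2r}{r+s}$ and $1-\epsilon = \frac{2s}{r+s}$.

Then (4) reduces to

\[ \log\Big(\sum_{X \in \mathcal{X}} \sum_{Y \in \mathcal{Y}} r^{a(X,Y)} s^{d(X,Y)}\Big) \le n(\log(r+s) - s/r) + \frac{r+s}{2r}\big(\log|\mathcal{X}| + \log|\mathcal{Y}|\big). \tag{5} \]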
We will express the left hand side of this expression as the entropy of a random variable, and
proceed to expand it according to the chain rule. First let $A_{00}, A_{11}, A_{10}, A_{01}$ be four disjoint
sets with

\[ |A_{00}| = |A_{11}| = r, \qquad |A_{01}| = |A_{10}| = s. \]

Next let $(X, Y, Z)$ be a random variable which is distributed uniformly over all triples such that
$X \in \mathcal{X}$, $Y \in \mathcal{Y}$, and for every $1 \le i \le n$, $Z_i \in A_{X_i Y_i}$. Clearly $Z$ determines $X$ and $Y$ so that

\[ H(X,Y,Z) = H(Z) = \log\Big(\sum_{X \in \mathcal{X}} \sum_{Y \in \mathcal{Y}} r^{a(X,Y)} s^{d(X,Y)}\Big). \]
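
The counting identity behind $H(Z)$, and the reduction of (3) to (5), can be confirmed by direct
enumeration for small parameters. The following Python sketch is an illustration under stated
assumptions: the letters of the sets $A_{xy}$ are modelled as tuples `(x, y, k)`, and the function
names and the sample sets `cal_x`, `cal_y` are hypothetical choices made here.

```python
import itertools
import math


def agreements(X, Y):
    """a(X, Y): number of coordinates on which X and Y agree."""
    return sum(u == v for u, v in zip(X, Y))


def count_triples(cal_x, cal_y, n, r, s):
    """Number of vectors Z with Z_i in A_{X_i Y_i} for some X in cal_x, Y in cal_y."""
    # A_xy has r letters when x == y and s letters otherwise; a letter records (x, y).
    alphabet = [(x, y, k) for x in (0, 1) for y in (0, 1)
                for k in range(r if x == y else s)]
    count = 0
    for z in itertools.product(alphabet, repeat=n):
        X = tuple(letter[0] for letter in z)
        Y = tuple(letter[1] for letter in z)
        if X in cal_x and Y in cal_y:
            count += 1
    return count


def weighted_sum(cal_x, cal_y, n, r, s):
    """Sum over X in cal_x, Y in cal_y of r^a(X,Y) * s^d(X,Y)."""
    return sum(r ** agreements(X, Y) * s ** (n - agreements(X, Y))
               for X in cal_x for Y in cal_y)


def slack_3(cal_x, cal_y, n, r, s):
    """log of the right hand side of (3) minus log of its left hand side."""
    eps = (r - s) / (r + s)
    prob = sum(((1 + eps) / 2) ** agreements(X, Y)
               * ((1 - eps) / 2) ** (n - agreements(X, Y)) / 2 ** n
               for X in cal_x for Y in cal_y)
    mu = len(cal_x) * len(cal_y) / 4 ** n
    return math.log2(mu) / (1 + eps) - math.log2(prob)


def slack_5(cal_x, cal_y, n, r, s):
    """Right hand side of (5) minus its left hand side."""
    rhs = (n * (math.log2(r + s) - s / r)
           + (r + s) / (2 * r) * (math.log2(len(cal_x)) + math.log2(len(cal_y))))
    return rhs - math.log2(weighted_sum(cal_x, cal_y, n, r, s))


# Arbitrary illustrative parameters and sets.
n, r, s = 3, 3, 2
cal_x = {(0, 0, 1), (1, 0, 1), (1, 1, 1)}
cal_y = {(0, 1, 0), (1, 1, 1)}
```

Here `math.log2(count_triples(...))` plays the role of $H(Z)$, and the two slack functions should
return the same value, which is the content of the "straightforward manipulation".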

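Next, for any vector $W$ we denote $(W_1, \ldots, W_{i-1}) := W_{<i}$. So by the chain rule we
have

\[ H(Z) = \sum_i H(Z_i \mid Z_{<i}) \]

and

\[ H(X) = \sum_i H(X_i \mid X_{<i}) \le \log(|\mathcal{X}|), \qquad H(Y) = \sum_i H(Y_i \mid Y_{<i}) \le \log(|\mathcal{Y}|). \]

Hence it suffices to prove

\[ H(Z) \le \frac{r+s}{2r}\big(H(X) + H(Y)\big) + n(\log(r+s) - s/r). \tag{6} \]

Noting that any fixed value of $Z_{<i}$ determines the values $X_{<i}$ and $Y_{<i}$ we have

\[ H(X_i \mid Z_{<i}) \le H(X_i \mid X_{<i}), \]

and

\[ H(Y_i \mid Z_{<i}) \le H(Y_i \mid Y_{<i}), \]

so inequality (6) will follow from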

Claim 2.2: Denote a fixed value of $Z_{<i}$ by Past. Then for all fixed values of Past

\[ H(Z_i \mid \text{Past}) \le \frac{r+s}{2r}\big(H(X_i \mid \text{Past}) + H(Y_i \mid \text{Past})\big) + \log(r+s) - \frac{s}{r}. \]

Proof: We condition on Past, and by abuse of notation drop the dependency in the notation,
e.g. $H(X)$ and $H(Z \mid X)$ rather than $H(X \mid \text{Past})$, $H(Z \mid X, \text{Past})$, etc. We also drop the index $i$
and write $X$ for $X_i$ etc., so, using the new notation, we want to prove (for all integers $r > s > 0$)

\[ H(Z) \le \frac{r+s}{2r}\big(H(X) + H(Y)\big) + \log(r+s) - \frac{s}{r}. \]

Note that $H(Z) = H(X,Y) + H(Z \mid X,Y) = H(X,Y) + \log r \cdot \Pr[X = Y] + \log s \cdot \Pr[X \neq Y]$,
so we need to prove

\[ \frac{r+s}{2r}\big(H(X) + H(Y)\big) - H(X,Y) - \log r \cdot \Pr[X = Y] - \log s \cdot \Pr[X \neq Y] + \log(r+s) - \frac{s}{r} \ge 0. \]
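Since this expression is invariant when $r$ and $s$ are multiplied by any positive constant we can
set $r = 1$ and denote $\delta := s/r$. Next, for a joint distribution of $X$ and $Y$ on $\{0,1\}^2$, and
$(i,j) \in \{0,1\}^2$ let $p_{ij} = \Pr[X = i, Y = j]$. So we want to prove

\[ F_\delta\begin{pmatrix} p_{01} & p_{11} \\ p_{00} & p_{10} \end{pmatrix} := \frac{1+\delta}{2}\big(H(X) + H(Y)\big) - H(X,Y) - \log\delta \cdot \Pr[X \neq Y] + \log(1+\delta) - \delta \ge 0. \tag{7} \]

We know (and can check directly from the formula) that equality holds when $\mathcal{X} = \mathcal{Y} = \{0,1\}^n$,
in which case we have

\[ \begin{pmatrix} p_{01} & p_{11} \\ p_{00} & p_{10} \end{pmatrix} = \begin{pmatrix} \frac{\delta}{2+2\delta} & \frac{1}{2+2\delta} \\ \frac{1}{2+2\delta} & \frac{\delta}{2+2\delta} \end{pmatrix}. \tag{8} \]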


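We wish to show that this is the unique minimum. To simplify notation (and save indices) we
denote

\[ \begin{pmatrix} p_{01} & p_{11} \\ p_{00} & p_{10} \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \]

and attempt to minimise $F_\delta\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ under the constraints

\[ a, b, c, d \ge 0 \quad \text{and} \quad a + b + c + d = 1. \]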
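
Before carrying out the minimisation it may help to evaluate $F_\delta$ of (7) numerically. The Python
sketch below is illustrative only: the names `entropy`, `F` and `equality_point`, the use of the
probabilities $p_{ij}$ as arguments, and the random sampling scheme are assumptions made here. It
evaluates $F_\delta$ at the point (8) and at random interior distributions.

```python
import math
import random


def entropy(probs):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def F(delta, p00, p01, p10, p11):
    """F_delta of (7), where p_ij = Pr[X = i, Y = j] and logs are base 2."""
    h_x = entropy([p00 + p01, p10 + p11])   # marginal of X
    h_y = entropy([p00 + p10, p01 + p11])   # marginal of Y
    h_xy = entropy([p00, p01, p10, p11])    # joint entropy
    return ((1 + delta) / 2 * (h_x + h_y) - h_xy
            - math.log2(delta) * (p01 + p10)
            + math.log2(1 + delta) - delta)


def equality_point(delta):
    """The distribution (8): Pr[X = Y] = 1/(1+delta), uniform marginals."""
    q = 1 / (2 + 2 * delta)
    return {"p00": q, "p01": delta * q, "p10": delta * q, "p11": q}


# Random interior distributions and random delta in (0, 1] (illustrative data).
random.seed(3)
values = []
for _ in range(2000):
    delta = random.uniform(0.05, 1.0)
    w = [random.random() + 1e-12 for _ in range(4)]
    total = sum(w)
    values.append(F(delta, *(v / total for v in w)))
```

Consistently with (7), the sampled values should be non-negative up to rounding, and
`F(delta, **equality_point(delta))` should vanish for every `delta` in $(0,1]$.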

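Using Lagrange multipliers we deduce (after some simple cancellations) that at a local minimum
in the interior of the region in question one must have

\begin{align}
\frac{\delta}{a}\,[(a+b)(a+c)]^{(1+\delta)/2} &= \tag{9} \\
\frac{1}{b}\,[(a+b)(b+d)]^{(1+\delta)/2} &= \tag{10} \\
\frac{1}{c}\,[(a+c)(c+d)]^{(1+\delta)/2} &= \tag{11} \\
\frac{\delta}{d}\,[(c+d)(b+d)]^{(1+\delta)/2}. & \tag{12}
\end{align}

From the fact that (9)$\cdot$(12) = (10)$\cdot$(11) we get that

\[ ad = \delta^2 bc. \tag{13} \]

Next we plug (13) and the restriction $a + b + c + d = 1$ into the equation (9) = (12). Since
(13) gives $(a+b)(a+c) = a\,(1 + (\delta^{-2} - 1)d)$ and $(c+d)(b+d) = d\,(1 + (\delta^{-2} - 1)a)$, this yields

\[ \Big(\frac{d}{a}\Big)^{(1-\delta)/2} = \left[\frac{1 + (\delta^{-2} - 1)a}{1 + (\delta^{-2} - 1)d}\right]^{(1+\delta)/2}. \tag{14} \]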
Note that for every fixed value of $b + c$, the value of $a + d$ is fixed, so letting $d$ grow from 0 to
$1 - b - c$, as $a = 1 - b - c - d$ decreases from $1 - b - c$ to 0, we see that the left hand side of (14)
is increasing and the right hand side is decreasing, hence there exists a single solution, which,
by inspection, is $a = d$.

We now know that $a = d$ and that $bc = \delta^{-2}a^2$ and $b + c = 1 - 2a$. This gives $b$ and $c$ as
the roots of the quadratic equation $X^2 - (1 - 2a)X + \delta^{-2}a^2 = 0$. Plugging these roots into the
equation (10) = (11), and using $a = d$, yields the following equation for $a$:

\[ \frac{[1 - 2a + S(a)]\,[1 - S(a)]^{1+\delta}}{[1 - 2a - S(a)]\,[1 + S(a)]^{1+\delta}} = 1, \tag{15} \]

where $S(a)$ denotes $\sqrt{1 - 4a + 4(1 - \delta^{-2})a^2}$. Now, $a$ can take on values between 0 and 1/2, as
long as $S(a) \ge 0$, so the relevant range is $0 \le a \le \frac{\delta}{2+2\delta}$. When $a = \frac{\delta}{2+2\delta}$, as required, then
$S(a) = 0$ and equation (15) clearly holds. An elementary (but slightly tedious) calculation
shows that the left hand side of (15) is a decreasing function of $a$ in the interval $[0, \frac{\delta}{2+2\delta}]$, so
$(a, b, c, d)$ is determined, and there is a unique internal minimum in the region which we are
exploring. (The meticulous reader may check that the derivative of the left hand side of (15)
according to $a$ is

\[ \frac{16a^2(\delta - 1)(\delta - 2a - 2a\delta)\,[1 - S(a)]^{\delta}}{\delta^3\, S(a)\,[-1 + 2a + S(a)]^2\,[1 + S(a)]^{2+\delta}}, \]

which has a negative sign in our range, due to the factor $(\delta - 1)$; all other factors are
positive for $0 < a < \frac{\delta}{2+2\delta}$.)
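
The monotonicity claim for the left hand side of (15) can be sanity-checked numerically. The
Python sketch below is only an illustration. The function names, the value `delta = 0.5`, and the
grid are assumptions made here, and the closed form in `lhs_15_derivative` is the result of
differentiating (15) directly for this sketch, so it should be read as an assumption to be tested
rather than as a transcription.

```python
import math


def S(a, delta):
    """S(a) = sqrt(1 - 4a + 4(1 - delta^-2) a^2), clamped at 0 against rounding."""
    return math.sqrt(max(0.0, 1 - 4 * a + 4 * (1 - delta ** -2) * a * a))


def lhs_15(a, delta):
    """Left hand side of (15)."""
    s = S(a, delta)
    return (((1 - 2 * a + s) * (1 - s) ** (1 + delta))
            / ((1 - 2 * a - s) * (1 + s) ** (1 + delta)))


def lhs_15_derivative(a, delta):
    """Candidate closed form for the derivative of lhs_15 with respect to a."""
    s = S(a, delta)
    num = (16 * a ** 2 * (delta - 1) * (delta - 2 * a - 2 * a * delta)
           * (1 - s) ** delta)
    den = delta ** 3 * s * (-1 + 2 * a + s) ** 2 * (1 + s) ** (2 + delta)
    return num / den


# Evaluate on a grid over (0, delta / (2 + 2 delta)] for an arbitrary delta.
delta = 0.5
a_star = delta / (2 + 2 * delta)
grid = [a_star * k / 50 for k in range(1, 51)]
values = [lhs_15(a, delta) for a in grid]

# Central finite difference at an interior point, for comparison with the closed form.
h = 1e-6
finite_difference = (lhs_15(0.1 + h, delta) - lhs_15(0.1 - h, delta)) / (2 * h)
```

On this grid the entries of `values` should decrease to 1 at the right endpoint, and
`finite_difference` should be close to `lhs_15_derivative(0.1, delta)`.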


What about points on the boundary of the region? First, note that the matrix

\[ \begin{pmatrix} p_{01} & p_{11} \\ p_{00} & p_{10} \end{pmatrix} \]

can have either no zeros (an interior point, the case we have already covered), two zeros in the
same row or column, or three zeros, i.e. it cannot have a single zero, or zeros only on one of
the diagonals. The reason is that these probabilities signify correlations between sets of vectors,
and can be equal to zero only if one of the sets is empty. The case of three zeros boils down to
noting that $\log_2(1 + \delta) - \delta$ is non-negative for $\delta \in [0,1]$. Let's consider the case of two zeros
in the same row or column. In this case one can check that a suitable infinitesimal change, moving
mass of order $\Delta$ into the zero entries, induces a change of $-c\,\Delta\log(1/\Delta)(1 + o(1))$ in the value
of $F_\delta$ for some positive constant $c$ (where the $o(1)$ notation is as $\Delta$ goes to zero).
Since this is negative there cannot be a local minimum of this form.

2.2 The general (non-Boolean) case 

The general case is actually a minor extension of the Boolean one, that follows by adding one 
more coordinate to each of the random variables X and Y. 

Theorem 2.3: Let $\epsilon \in (0,1)$, and let $f, g : \{0,1\}^n \to \mathbb{R}^{\ge 0}$. Let $X$ be uniformly distributed on
$\{0,1\}^n$, and let $Y$ be such that for each $1 \le i \le n$ independently $\Pr[X_i = Y_i] = \frac{1+\epsilon}{2}$. Then

\[ E[f(X)g(Y)] \le |f|_{1+\epsilon}\,|g|_{1+\epsilon}. \tag{16} \]
Proof: By continuity it suffices to consider the case where all values of $f$ and $g$ are rational,
and by homogeneity, we can clear common denominators and assume that they are integer
valued. Now we wish to prove a slight extension of (5), namely

\[ \log\Big(\sum_{X \in \{0,1\}^n} \sum_{Y \in \{0,1\}^n} r^{a(X,Y)} s^{d(X,Y)} f(X) g(Y)\Big) \le \tag{17} \]

\[ n(\log(r+s) - s/r) + \frac{r+s}{2r}\left[\log\Big(\sum_{X \in \{0,1\}^n} f(X)^{\frac{2r}{r+s}}\Big) + \log\Big(\sum_{Y \in \{0,1\}^n} g(Y)^{\frac{2r}{r+s}}\Big)\right]. \tag{18} \]
To this end we now define as before the random variables $X, Y, Z$ and add two more
integer random variables $a$ and $b$, and take $(X, Y, Z, a, b)$ uniformly, with $(X, Y, Z)$ as before
and the additional constraint that $a \in \{1, \ldots, f(X)\}$, $b \in \{1, \ldots, g(Y)\}$. Now (17) is precisely
$H(Z,a,b) = H(Z) + H(a \mid Z) + H(b \mid Z) = H(Z) + E_X[\log(f(X))] + E_Y[\log(g(Y))]$. On the other
hand, note that for any positive function $t$ it holds that

\[ H(X) + E_X[\log(t(X))] \le \log\Big(\sum_{X \in \{0,1\}^n} t(X)\Big). \]

In particular, taking $t = f^{\frac{2r}{r+s}}$ and multiplying by $\frac{r+s}{2r}$,

\[ \frac{r+s}{2r} H(X) + E_X[\log(f(X))] \le \frac{r+s}{2r} \log\Big(\sum_{X \in \{0,1\}^n} f(X)^{\frac{2r}{r+s}}\Big) \]

and

\[ \frac{r+s}{2r} H(Y) + E_Y[\log(g(Y))] \le \frac{r+s}{2r} \log\Big(\sum_{Y \in \{0,1\}^n} g(Y)^{\frac{2r}{r+s}}\Big). \]

To complete the proof of Theorem 2.3 we just add $E_X[\log(f(X))] + E_Y[\log(g(Y))]$ to both sides
of the main inequality that we proved in the Boolean case, namely

\[ H(Z) \le \frac{r+s}{2r}\big(H(X) + H(Y)\big) + n(\log(r+s) - s/r), \]

and we're done.


□ 
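
The auxiliary inequality $H(X) + E_X[\log t(X)] \le \log \sum_X t(X)$ used in the proof above is an
instance of Jensen's inequality, with equality when the distribution of $X$ is proportional to $t$.
The Python sketch below is only an illustration; the names `entropy` and `gibbs_gap`, and the
random test data, are assumptions made here.

```python
import math
import random


def entropy(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def gibbs_gap(p, t):
    """log(sum_x t(x)) - (H(X) + E[log t(X)]) for X ~ p and positive t."""
    expected_log = sum(q * math.log2(v) for q, v in zip(p, t) if q > 0)
    return math.log2(sum(t)) - entropy(p) - expected_log


# Random distributions p and positive functions t on small finite sets.
random.seed(2)
gaps = []
for _ in range(500):
    k = random.randint(2, 8)
    w = [random.random() + 1e-9 for _ in range(k)]
    p = [v / sum(w) for v in w]
    t = [random.uniform(0.1, 10.0) for _ in range(k)]
    gaps.append(gibbs_gap(p, t))

# Equality case: the distribution proportional to t.
t_example = [1.0, 2.0, 5.0]
p_tight = [v / sum(t_example) for v in t_example]
```

All entries of `gaps` should be non-negative up to rounding, and `gibbs_gap(p_tight, t_example)`
should vanish.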

Acknowledgments: I would like to thank David Ellis and Gideon Schechtman for useful 
conversations, and the anonymous referee for pointing out an error, which, once I fixed it, led 
to a simplification of the proof. 

Added in proof: Chandra Nair alerted me to the existence of two papers, m and [B], which 
also adopt an information theoretic approach to hypercontractivity. In both of them inequalities 
similar or equivalent to (8) are shown to be sufficient to imply hypercontractivity. I thank him
for pointing out these papers to me, and also for spotting some errors in an early draft of this 
paper. 